Workflows Demo Session

Data Science Workflows

Dr Zak Varty

Session Outline

  1. Course Structure, Admin and Expectations

  2. Introductions and Definitions

  3. Markdown, Git and Quarto

Course Structure, Admin and Expectations

Weekly Checklists

  • Videos (& Notes)
  • Readings
    • core, reference, materials of interest
  • Activities
    • short, core, bonus

Weekly Live Sessions

Mix of lecture, lab and group discussion.

  • Mondays: Demo Sessions
    • Interactive
    • Demonstration focused
    • This week is a bit of an exception.
  • Fridays: Live Sessions
    • Week in review
    • Task review
    • Group discussion
    • Additional activities

How to get help?

  • Ed Discussion Forum

  • Drop-in Sessions: Fridays 15:00-17:00, Huxley 711C.

  • Office Hours: Mondays 15:00-16:00, Huxley 6M20.

Asking for help: make a reprex

A reprex (reproducible example) should be self-contained, plain text, and as small & simple as possible. Consider sharing it as a Gist.
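As an illustration of what "self-contained and small" can look like, here is a minimal sketch of a reprex in base R. The data values and the question being asked are made up for this example and are not taken from the course materials.

```r
# Small inline data: no external files or packages are needed to run this.
scores <- c(4.2, 5.1, NA, 3.8)

# The behaviour being asked about: the mean is NA when a value is missing.
mean(scores)

# What has been tried so far: dropping missing values before averaging.
mean(scores, na.rm = TRUE)

# Include the R version so that helpers can match the setup.
R.version.string
```

A snippet like this can be pasted directly into a forum post or a Gist. The reprex package (`reprex::reprex()`) can also render such a snippet together with its output, if it is installed.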

Assessments

Assessment 1 (30%)

  • Similar to last year, reproducible data journalism task.
  • Week 6 (2024-02-12) Mon 09:00 - Fri 13:00

Assessment 2 (70%)

  • Create a data product of your own devising.
  • Full details released in week 4 (2024-01-29); due after the exam period (2024-05-15 at 13:00).

Introductions

About Me

  • Maths + Stats + OR
  • Structured / missing data problems
  • Environmental and Industrial applications
  • R, LaTeX, Git, R Markdown, C++, Python, SQL, Quarto, Julia

About You - Menti

Which have you heard of before?

Which do you have experience of using?

About Data Science - Menti

What is data science to you and how does it relate to statistics?

What do you hope to get out of a data science course?

What will we cover?

  1. Workflows
  2. Acquiring and Sharing Data
  3. Wrangling and Visualisation
  4. Preparing for production
  5. Data Science Ethics

Why R?

As a first language

  • Open-source, high-level
  • Documentation
  • Generality vs Specificity (Uncertainty built in)
  • Community, beginner friendly material

As a second (or later) language

  • Critique and improve your first language
  • Natural vectorisation and Linear Algebra
  • Easy interface to low-level languages
  • dplyr transition to SQL
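To make the "natural vectorisation and linear algebra" point concrete, here is a short base R sketch. The variable names and numbers are illustrative assumptions, not course data.

```r
# Vectorised arithmetic: the division applies element-wise, with no explicit loop.
heights_cm <- c(150, 165, 180)
heights_m <- heights_cm / 100

# Linear algebra: solve the 2 x 2 linear system A x = b.
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(3, 5)
x <- solve(A, b)

# Check the solution: multiplying A by x should recover b.
A %*% x
```

In many other languages the same operations need an explicit loop or an extra library. Here they use only built-in operators and functions.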

Markdown, Quarto and Git

Intro to Markdown, Quarto and Git

Before Friday:

  • Install/Update R, RStudio and Quarto.

  • Follow the Happy Git with R instructions (§1-14) to link RStudio, Git and GitHub.

  • Write a username/README.md introducing yourself on your GitHub profile and share it in the discussion forum (see e.g. StatsRhian, nrennie).